Overview

Dataset statistics

Number of variables: 31
Number of observations: 1000000
Missing cells: 541771
Missing cells (%): 1.7%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 236.5 MiB
Average record size in memory: 248.0 B
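
The two derived figures in this table follow from the raw counts. The short check below is only a sketch: the variable names are illustrative, and it assumes that the missing-cell percentage is taken over all cells (variables times observations) and that 1 MiB is 1024 * 1024 bytes.

```python
# Figures copied from the "Dataset statistics" table; names are illustrative.
n_variables = 31
n_observations = 1_000_000
missing_cells = 541_771
total_size_mib = 236.5

# Assumption: the percentage is relative to every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 MiB = 1024 * 1024 bytes.
avg_record_bytes = total_size_mib * 1024 * 1024 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Both values round to the figures shown in the table.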

Variable types

NUM: 17
CAT: 10
BOOL: 4

Reproduction

Analysis started: 2020-07-13 08:50:30.526124
Analysis finished: 2020-07-13 09:01:21.679568
Duration: 10 minutes and 51.15 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

EVENTDATE has constant value "20190802" Constant
PROVIDER has constant value "1000" Constant
ORIGINATIONNETWORKID has constant value "1" Constant
MONTHID has constant value "201908" Constant
LOAD_ID has constant value "1" Constant
LOADDATE has constant value "03-AUG-19" Constant
CLASSIFICATION has a high cardinality: 1741 distinct values High cardinality
ROAMING_DETAILS has a high cardinality: 62 distinct values High cardinality
RELOADFACEVALUE is highly correlated with BALANCERELOAD High correlation
BALANCERELOAD is highly correlated with RELOADFACEVALUE High correlation
HOME_CELLID is highly correlated with CELL_ID High correlation
CELL_ID is highly correlated with HOME_CELLID High correlation
CELL_ID has 123307 (12.3%) missing values Missing
ROAMING_DETAILS has 108880 (10.9%) missing values Missing
PS_TYPE has 303324 (30.3%) missing values Missing
COSID is highly skewed (γ1 = 26.63298874) Skewed
SHORTCODEID is highly skewed (γ1 = 188.3301685) Skewed
TOT_CHARGED_AMT is highly skewed (γ1 = 47.05091688) Skewed
BALANCERELOAD is highly skewed (γ1 = 69.40334234) Skewed
NO_OF_EVENTS is highly skewed (γ1 = 428.6914845) Skewed
BONUS is highly skewed (γ1 = -100.6923622) Skewed
TOT_ROUNDED_VOL is highly skewed (γ1 = 38.09800078) Skewed
CELL_ID is highly skewed (γ1 = 211.7311456) Skewed
RELOADFACEVALUE is highly skewed (γ1 = 74.06165783) Skewed
HOME_CELLID is highly skewed (γ1 = 222.5416205) Skewed
ORIGINATING_COUNTRY_ID is highly skewed (γ1 = -121.6382015) Skewed
SHORTCODEID has 972545 (97.3%) zeros Zeros
TOT_CHARGED_AMT has 832314 (83.2%) zeros Zeros
BALANCERELOAD has 977344 (97.7%) zeros Zeros
TOT_ACTUAL_DURATION has 154746 (15.5%) zeros Zeros
BONUS has 997234 (99.7%) zeros Zeros
TOT_ROUNDED_VOL has 467895 (46.8%) zeros Zeros
RELOADFACEVALUE has 976275 (97.6%) zeros Zeros
DESTINATION_COUNTRY_ID has 750311 (75.0%) zeros Zeros
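
The "Skewed" warnings above report a skewness coefficient γ1 per column. As a rough illustration of how such a flag can be derived, the sketch below computes the moment-based (population) skewness and compares its absolute value to a threshold. The function names and the threshold of 20 are assumptions chosen to be consistent with the columns flagged here; they are not taken from the report, and the profiler may use a bias-adjusted estimator instead.

```python
def skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (population form, no bias adjustment)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # constant column: skewness is undefined, 0.0 by convention here
    return m3 / m2 ** 1.5


def is_highly_skewed(values, threshold=20.0):
    """Hypothetical flag: threshold is an assumed value, not read from the report."""
    return abs(skewness(values)) > threshold


# Toy column resembling the zero-dominated variables in this report.
column = [0] * 999 + [1000]
print(skewness(column), is_highly_skewed(column))
```

A column that is almost entirely zeros with a few large values, like the toy column here, produces a large positive γ1, which matches the pattern of the "Skewed" and "Zeros" warnings appearing together.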

Variables

EVENTDATE
Categorical

CONSTANT
REJECTED

Distinct count1
Unique (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size7.6 MiB
20190802
1000000
ValueCountFrequency (%) 
201908021000000100.0%
 

Length

Max length8
Median length8
Mean length8
Min length8

EVENT_LABEL
Real number (ℝ≥0)

Distinct count15
Unique (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean38.913653
Minimum1
Maximum300
Zeros0
Zeros (%)0.0%
Memory size7.6 MiB

Quantile statistics

Minimum1
5-th percentile1
Q125
median46
Q346
95-th percentile74
Maximum300
Range299
Interquartile range (IQR)21

Descriptive statistics

Standard deviation28.25856677
Coefficient of variation (CV)0.7261864304
Kurtosis7.065870671
Mean38.913653
Median Absolute Deviation (MAD)0
Skewness1.583467352
Sum38913653
Variance798.5465957
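
Several of the statistics listed for EVENT_LABEL are related by simple identities. The following sketch re-derives them from the reported values; the variable names are illustrative, and the inputs are copied from the tables above rather than recomputed from the data.

```python
# Values copied from the EVENT_LABEL tables above.
mean = 38.913653
std = 28.25856677
minimum, maximum = 1, 300
q1, q3 = 25, 46
n_observations = 1_000_000

cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
value_range = maximum - minimum  # range
iqr = q3 - q1                    # interquartile range
total = mean * n_observations    # sum, valid because no values are missing

print(cv, variance, value_range, iqr, total)
```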
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
4665949265.9%
 
118974219.0%
 
4464014.6%
 
25358713.6%
 
74331483.3%
 
139172011.7%
 
17847650.5%
 
17944640.4%
 
6840350.4%
 
13322550.2%
 
Other values (5)26260.3%
 
ValueCountFrequency (%) 
118974219.0%
 
248< 0.1%
 
4464014.6%
 
25358713.6%
 
4665949265.9%
 
ValueCountFrequency (%) 
300101< 0.1%
 
18226< 0.1%
 
17944640.4%
 
17847650.5%
 
139172011.7%
 

PROVIDER
Categorical

CONSTANT
REJECTED

Distinct count1
Unique (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size7.6 MiB
1000
1000000
ValueCountFrequency (%) 
10001000000100.0%
 

Length

Max length4
Median length4
Mean length4
Min length4

COSID
Real number (ℝ≥0)

SKEWED

Distinct count60
Unique (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean1068.848268
Minimum1005
Maximum7039
Zeros0
Zeros (%)0.0%
Memory size7.6 MiB

Quantile statistics

Minimum1005
5-th percentile1020
Q11021
median1070
Q31090
95-th percentile1090
Maximum7039
Range6034
Interquartile range (IQR)69

Descriptive statistics

Standard deviation220.0750486
Coefficient of variation (CV)0.2058992424
Kurtosis718.9617326
Mean1068.848268
Median Absolute Deviation (MAD)20
Skewness26.63298874
Sum1068848268
Variance48433.02703
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
107043228343.2%
 
109024819824.8%
 
102121343321.3%
 
1020770317.7%
 
112153030.5%
 
109249060.5%
 
104434980.3%
 
108428390.3%
 
107825970.3%
 
104514940.1%
 
Other values (50)84180.8%
 
ValueCountFrequency (%) 
1005196< 0.1%
 
10072< 0.1%
 
10109< 0.1%
 
10111< 0.1%
 
101218< 0.1%
 
ValueCountFrequency (%) 
7039383< 0.1%
 
70388< 0.1%
 
703767< 0.1%
 
703614< 0.1%
 
703522< 0.1%
 

MSISDN
Real number (ℝ≥0)

Distinct count628527
Unique (%)62.9%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean233328081205.61676
Minimum233200000005
Maximum233579999998
Zeros0
Zeros (%)0.0%
Memory size7.6 MiB

Quantile statistics

Minimum2.332e+11
5-th percentile2.332009404e+11
Q12.332048586e+11
median2.332089905e+11
Q32.335038607e+11
95-th percentile2.335088371e+11
Maximum2.3358e+11
Range379999993
Interquartile range (IQR)299002130

Descriptive statistics

Standard deviation146712907.4
Coefficient of variation (CV)0.0006287837564
Kurtosis-1.837075611
Mean2.333280812e+11
Median Absolute Deviation (MAD)7680246
Skewness0.3845085067
Sum2.333280812e+17
Variance2.152467719e+16
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
2.335006119e+1139< 0.1%
 
2.332023045e+1133< 0.1%
 
2.332082907e+1131< 0.1%
 
2.332053568e+1127< 0.1%
 
2.332468555e+1127< 0.1%
 
2.33203757e+1126< 0.1%
 
2.332776212e+1125< 0.1%
 
2.335073757e+1124< 0.1%
 
2.33202741e+1124< 0.1%
 
2.33206317e+1124< 0.1%
 
Other values (628517)999720> 99.9%
 
ValueCountFrequency (%) 
2.332e+113< 0.1%
 
2.332e+111< 0.1%
 
2.332e+111< 0.1%
 
2.332000001e+114< 0.1%
 
2.332000001e+111< 0.1%
 
ValueCountFrequency (%) 
2.3358e+111< 0.1%
 
2.335799699e+112< 0.1%
 
2.335799193e+111< 0.1%
 
2.335799123e+112< 0.1%
 
2.335798663e+111< 0.1%
 

ORIGINATIONNETWORKID
Boolean

CONSTANT
REJECTED

Distinct count1
Unique (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size7.6 MiB
1
1000000
ValueCountFrequency (%) 
11000000100.0%
 

DESTINATIONNETWORKID
Real number (ℝ≥0)

Distinct count6
Unique (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean1.130557
Minimum1
Maximum7
Zeros0
Zeros (%)0.0%
Memory size7.6 MiB

Quantile statistics

Minimum1
5-th percentile1
Q11
median1
Q31
95-th percentile2
Maximum7
Range6
Interquartile range (IQR)0

Descriptive statistics

Standard deviation0.4967233802
Coefficient of variation (CV)0.4393616423
Kurtosis60.6926955
Mean1.130557
Median Absolute Deviation (MAD)0
Skewness6.560204839
Sum1130557
Variance0.2467341165
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
190263090.3%
 
2801778.0%
 
3100971.0%
 
440080.4%
 
727220.3%
 
6366< 0.1%
 
ValueCountFrequency (%) 
190263090.3%
 
2801778.0%
 
3100971.0%
 
440080.4%
 
6366< 0.1%
 
ValueCountFrequency (%) 
727220.3%
 
6366< 0.1%
 
440080.4%
 
3100971.0%
 
2801778.0%
 

ROAMING_FLAG
Boolean

Distinct count2
Unique (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size7.6 MiB
0
999905
1
 
95
ValueCountFrequency (%) 
0999905> 99.9%
 
195< 0.1%
 

BILLED_FLAG
Boolean

Distinct count2
Unique (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size7.6 MiB
N
835599
Y
 
164401
ValueCountFrequency (%) 
N83559983.6%
 
Y16440116.4%
 

SHORTCODEID
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct count62
Unique (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean62.394336
Minimum0
Maximum233313
Zeros972545
Zeros (%)97.3%
Memory size7.6 MiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q30
95-th percentile0
Maximum233313
Range233313
Interquartile range (IQR)0

Descriptive statistics

Standard deviation1026.78829
Coefficient of variation (CV)16.45643428
Kurtosis42609.3463
Mean62.394336
Median Absolute Deviation (MAD)0
Skewness188.3301685
Sum62394336
Variance1054294.192
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
097254597.3%
 
199199661.0%
 
199591770.9%
 
58048340.5%
 
611116620.2%
 
71118740.1%
 
134164< 0.1%
 
2000136< 0.1%
 
1906106< 0.1%
 
57078< 0.1%
 
Other values (52)458< 0.1%
 
ValueCountFrequency (%) 
097254597.3%
 
1001< 0.1%
 
1171< 0.1%
 
134164< 0.1%
 
1501< 0.1%
 
ValueCountFrequency (%) 
23331316< 0.1%
 
185552< 0.1%
 
71118740.1%
 
70071< 0.1%
 
611116620.2%
 

TOT_CHARGED_AMT
Real number (ℝ)

SKEWED
ZEROS

Distinct count7002
Unique (%)0.7%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean0.09869795255
Minimum-5.0
Maximum199.0
Zeros832314
Zeros (%)83.2%
Memory size7.6 MiB

Quantile statistics

Minimum-5
5-th percentile0
Q10
median0
Q30
95-th percentile0.3115
Maximum199
Range204
Interquartile range (IQR)0

Descriptive statistics

Standard deviation0.8969460429
Coefficient of variation (CV)9.087787738
Kurtosis6086.053508
Mean0.09869795255
Median Absolute Deviation (MAD)0
Skewness47.05091688
Sum98697.95255
Variance0.8045122039
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
083231483.2%
 
0.15194021.9%
 
0.367310.7%
 
0.449520.5%
 
0.239170.4%
 
237800.4%
 
0.637690.4%
 
0.2536450.4%
 
0.112531110.3%
 
0.3527730.3%
 
Other values (6992)11560611.6%
 
ValueCountFrequency (%) 
-51< 0.1%
 
083231483.2%
 
1e-052< 0.1%
 
2e-052< 0.1%
 
3e-052< 0.1%
 
ValueCountFrequency (%) 
1992< 0.1%
 
1003< 0.1%
 
95.61< 0.1%
 
58.351< 0.1%
 
50.052< 0.1%
 

BALANCERELOAD
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct count177
Unique (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean0.09275834200000001
Minimum0.0
Maximum300.0
Zeros977344
Zeros (%)97.7%
Memory size7.6 MiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q30
95-th percentile0
Maximum300
Range300
Interquartile range (IQR)0

Descriptive statistics

Standard deviation1.205215955
Coefficient of variation (CV)12.99307349
Kurtosis11103.3117
Mean0.092758342
Median Absolute Deviation (MAD)0
Skewness69.40334234
Sum92758.342
Variance1.452545498
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
097734497.7%
 
286810.9%
 
531520.3%
 
127180.3%
 
2.812640.1%
 
1010370.1%
 
0.89740.1%
 
1.89050.1%
 
206200.1%
 
4463< 0.1%
 
Other values (167)28420.3%
 
ValueCountFrequency (%) 
097734497.7%
 
0.117< 0.1%
 
0.231< 0.1%
 
0.320< 0.1%
 
0.355< 0.1%
 
ValueCountFrequency (%) 
3001< 0.1%
 
268.021< 0.1%
 
2252< 0.1%
 
2001< 0.1%
 
1502< 0.1%
 

TOT_ACTUAL_DURATION
Real number (ℝ≥0)

ZEROS

Distinct count52373
Unique (%)5.2%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean4583.99856
Minimum0
Maximum178262
Zeros154746
Zeros (%)15.5%
Memory size7.6 MiB

Quantile statistics

Minimum0
5-th percentile0
Q129
median476
Q33721
95-th percentile24128
Maximum178262
Range178262
Interquartile range (IQR)3692

Descriptive statistics

Standard deviation10589.30818
Coefficient of variation (CV)2.310059229
Kurtosis22.00700643
Mean4583.99856
Median Absolute Deviation (MAD)476
Skewness4.181342138
Sum4583998560
Variance112133447.7
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
015474615.5%
 
155230.6%
 
352430.5%
 
249480.5%
 
448630.5%
 
542820.4%
 
3036690.4%
 
1035430.4%
 
634950.3%
 
1534860.3%
 
Other values (52363)80620280.6%
 
ValueCountFrequency (%) 
015474615.5%
 
155230.6%
 
249480.5%
 
352430.5%
 
448630.5%
 
ValueCountFrequency (%) 
1782621< 0.1%
 
1763671< 0.1%
 
1701441< 0.1%
 
1594511< 0.1%
 
1580341< 0.1%
 

NO_OF_EVENTS
Real number (ℝ≥0)

SKEWED

Distinct count222
Unique (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean1.493332
Minimum1
Maximum10241
Zeros0
Zeros (%)0.0%
Memory size7.6 MiB

Quantile statistics

Minimum1
5-th percentile1
Q11
median1
Q31
95-th percentile3
Maximum10241
Range10240
Interquartile range (IQR)0

Descriptive statistics

Standard deviation17.6903337
Coefficient of variation (CV)11.84621618
Kurtosis213072.4914
Mean1.493332
Median Absolute Deviation (MAD)0
Skewness428.6914845
Sum1493332
Variance312.9479065
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
180791180.8%
 
211878811.9%
 
3360083.6%
 
4154401.5%
 
575300.8%
 
643340.4%
 
724840.2%
 
816290.2%
 
911320.1%
 
107870.1%
 
Other values (212)39570.4%
 
ValueCountFrequency (%) 
180791180.8%
 
211878811.9%
 
3360083.6%
 
4154401.5%
 
575300.8%
 
ValueCountFrequency (%) 
102411< 0.1%
 
89811< 0.1%
 
71431< 0.1%
 
47591< 0.1%
 
30631< 0.1%
 

BONUS
Real number (ℝ)

SKEWED
ZEROS

Distinct count195
Unique (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean-2093.455698
Minimum-30000000
Maximum550500
Zeros997234
Zeros (%)99.7%
Memory size7.6 MiB

Quantile statistics

Minimum-30000000
5-th percentile0
Q10
median0
Q30
95-th percentile0
Maximum550500
Range30550500
Interquartile range (IQR)0

Descriptive statistics

Standard deviation88311.6292
Coefficient of variation (CV)-42.18461814
Kurtosis19338.68702
Mean-2093.455698
Median Absolute Deviation (MAD)0
Skewness-100.6923622
Sum-2093455698
Variance7798943853
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
099723499.7%
 
-100000306< 0.1%
 
-25600295< 0.1%
 
-200000218< 0.1%
 
-1048576205< 0.1%
 
-500000158< 0.1%
 
-81920157< 0.1%
 
-307200140< 0.1%
 
-5242880118< 0.1%
 
-460800112< 0.1%
 
Other values (185)10570.1%
 
ValueCountFrequency (%) 
-300000001< 0.1%
 
-157286402< 0.1%
 
-134010001< 0.1%
 
-120000001< 0.1%
 
-110000001< 0.1%
 
ValueCountFrequency (%) 
5505001< 0.1%
 
3176001< 0.1%
 
2889001< 0.1%
 
2829501< 0.1%
 
2643501< 0.1%
 

TOT_ROUNDED_VOL
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct count380994
Unique (%)38.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean9121522.18075
Minimum0
Maximum10815727200
Zeros467895
Zeros (%)46.8%
Memory size7.6 MiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median3091
Q3308130.75
95-th percentile32983308.9
Maximum1.08157272e+10
Range1.08157272e+10
Interquartile range (IQR)308130.75

Descriptive statistics

Standard deviation72578509.72
Coefficient of variation (CV)7.956841882
Kurtosis2946.746799
Mean9121522.181
Median Absolute Deviation (MAD)3091
Skewness38.09800078
Sum9.121522181e+12
Variance5.267640073e+15
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
046789546.8%
 
14012160.1%
 
1296640.1%
 
100389< 0.1%
 
123270< 0.1%
 
176240< 0.1%
 
131233< 0.1%
 
152215< 0.1%
 
300210< 0.1%
 
216200< 0.1%
 
Other values (380984)52846852.8%
 
ValueCountFrequency (%) 
046789546.8%
 
281< 0.1%
 
4031< 0.1%
 
411< 0.1%
 
424< 0.1%
 
ValueCountFrequency (%) 
1.08157272e+101< 0.1%
 
97002074921< 0.1%
 
92383483661< 0.1%
 
88486400001< 0.1%
 
80521728001< 0.1%
 

CELL_ID
Real number (ℝ≥0)

HIGH CORRELATION
MISSING
SKEWED

Distinct count23657
Unique (%)2.7%
Missing123307
Missing (%)12.3%
Infinite0
Infinite (%)0.0%
Mean28846.459548553485
Minimum1.0
Maximum9835523.0
Zeros0
Zeros (%)0.0%
Memory size7.6 MiB

Quantile statistics

Minimum1
5-th percentile11238
Q117660
median26743
Q336084
95-th percentile61661
Maximum9835523
Range9835522
Interquartile range (IQR)18424

Descriptive statistics

Standard deviation40425.06396
Coefficient of variation (CV)1.401387366
Kurtosis51352.17727
Mean28846.45955
Median Absolute Deviation (MAD)9249
Skewness211.7311456
Sum2.528948916e+10
Variance1634185796
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
65535103801.0%
 
28728640.1%
 
234988290.1%
 
134988090.1%
 
242195920.1%
 
142195650.1%
 
286405120.1%
 
234995080.1%
 
43498466< 0.1%
 
13499465< 0.1%
 
Other values (23647)86070386.1%
 
(Missing)12330712.3%
 
ValueCountFrequency (%) 
117< 0.1%
 
38< 0.1%
 
44< 0.1%
 
5193< 0.1%
 
711< 0.1%
 
ValueCountFrequency (%) 
98355236< 0.1%
 
98355217< 0.1%
 
1933821< 0.1%
 
1933731< 0.1%
 
1929921< 0.1%
 

CLASSIFICATION
Categorical

HIGH CARDINALITY

Distinct count1741
Unique (%)0.2%
Missing14
Missing (%)< 0.1%
Memory size7.6 MiB
VOICE
170890
500
140413
100
138313
1016
111466
1010
102633
Other values (1736)
336271
ValueCountFrequency (%) 
VOICE17089017.1%
 
50014041314.0%
 
10013831313.8%
 
101611146611.1%
 
101010263310.3%
 
1011940009.4%
 
SMS358713.6%
 
1018350463.5%
 
1017235612.4%
 
Kirusa189001.9%
 
Other values (1731)12889312.9%
 

Length

Max length19
Median length4
Mean length4.873047
Min length2

PROMOTION_CD
Categorical

Distinct count11
Unique (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size7.6 MiB
-1
936395
0
 
58708
201
 
4247
YDB
 
418
VCB
 
73
Other values (6)
 
159
ValueCountFrequency (%) 
-193639593.6%
 
0587085.9%
 
20142470.4%
 
YDB418< 0.1%
 
VCB73< 0.1%
 
PWD67< 0.1%
 
W4D49< 0.1%
 
ETB40< 0.1%
 
ARB1< 0.1%
 
RFS1< 0.1%
 

Length

Max length3
Median length2
Mean length1.946189
Min length1

PEAKID
Categorical

Distinct count4
Unique (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size7.6 MiB
1
798959
2
 
89263
5
 
61677
3
 
50101
ValueCountFrequency (%) 
179895979.9%
 
2892638.9%
 
5616776.2%
 
3501015.0%
 

Length

Max length1
Median length1
Mean length1
Min length1

RELOADFACEVALUE
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct count118
Unique (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean0.09741394199999996
Minimum0.0
Maximum300.0
Zeros976275
Zeros (%)97.6%
Memory size7.6 MiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q30
95-th percentile0
Maximum300
Range300
Interquartile range (IQR)0

Descriptive statistics

Standard deviation1.159820336
Coefficient of variation (CV)11.90610207
Kurtosis12783.31833
Mean0.097413942
Median Absolute Deviation (MAD)0
Skewness74.06165783
Sum97413.942
Variance1.345183212
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
097627597.6%
 
2102681.0%
 
545980.5%
 
133490.3%
 
1015360.2%
 
313240.1%
 
4444< 0.1%
 
20437< 0.1%
 
7270< 0.1%
 
6244< 0.1%
 
Other values (108)12550.1%
 
ValueCountFrequency (%) 
097627597.6%
 
0.19< 0.1%
 
0.226< 0.1%
 
0.36< 0.1%
 
0.48< 0.1%
 
ValueCountFrequency (%) 
3001< 0.1%
 
268.021< 0.1%
 
2252< 0.1%
 
2001< 0.1%
 
1502< 0.1%
 

IMSI
Real number (ℝ≥0)

Distinct count625892
Unique (%)62.9%
Missing5363
Missing (%)0.5%
Infinite0
Infinite (%)0.0%
Mean620020522424545.0
Minimum620020120000065.0
Maximum620020549438651.0
Zeros0
Zeros (%)0.0%
Memory size7.6 MiB

Quantile statistics

Minimum6.2002012e+14
5-th percentile6.200205006e+14
Q16.200205203e+14
median6.20020531e+14
Q36.200205406e+14
95-th percentile6.200205457e+14
Maximum6.200205494e+14
Range429438586
Interquartile range (IQR)20296428

Descriptive statistics

Standard deviation46480919
Coefficient of variation (CV)7.496674274e-08
Kurtosis43.49822939
Mean6.200205224e+14
Median Absolute Deviation (MAD)10508568
Skewness-6.019069216
Sum6.166953524e+20
Variance2.160475831e+15
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
6.20020543e+1439< 0.1%
 
6.200205238e+1433< 0.1%
 
6.200205011e+1431< 0.1%
 
6.20020536e+1427< 0.1%
 
6.200205386e+1427< 0.1%
 
6.200205299e+1426< 0.1%
 
6.200205267e+1425< 0.1%
 
6.200205306e+1424< 0.1%
 
6.200204104e+1424< 0.1%
 
6.200205235e+1424< 0.1%
 
Other values (625882)99435799.4%
 
(Missing)53630.5%
 
ValueCountFrequency (%) 
6.2002012e+144< 0.1%
 
6.2002012e+142< 0.1%
 
6.2002012e+141< 0.1%
 
6.2002012e+141< 0.1%
 
6.2002012e+141< 0.1%
 
ValueCountFrequency (%) 
6.200205494e+141< 0.1%
 
6.200205494e+141< 0.1%
 
6.200205494e+141< 0.1%
 
6.200205494e+141< 0.1%
 
6.200205494e+142< 0.1%
 

MONTHID
Categorical

CONSTANT
REJECTED

Distinct count1
Unique (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size7.6 MiB
201908
1000000
ValueCountFrequency (%) 
2019081000000100.0%
 

Length

Max length6
Median length6
Mean length6
Min length6

LOAD_ID
Boolean

CONSTANT
REJECTED

Distinct count1
Unique (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size7.6 MiB
1
1000000
ValueCountFrequency (%) 
11000000100.0%
 

LOADDATE
Categorical

CONSTANT
REJECTED

Distinct count1
Unique (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size7.6 MiB
03-AUG-19
1000000
ValueCountFrequency (%) 
03-AUG-191000000100.0%
 

Length

Max length9
Median length9
Mean length9
Min length9

CLASSIFICATION2
Categorical

Distinct count22
Unique (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size7.6 MiB
GPRS
659272
VOICE
189790
SMS
 
35871
MT_REVENUE
 
33148
THIRD_PARTY_DEDUCTION
 
20811
Other values (17)
 
61108
ValueCountFrequency (%) 
GPRS65927265.9%
 
VOICE18979019.0%
 
SMS358713.6%
 
MT_REVENUE331483.3%
 
THIRD_PARTY_DEDUCTION208112.1%
 
BUNDLE_SUBSCRIPTION180461.8%
 
USSD_RELOAD158841.6%
 
SOS_TOPUP47650.5%
 
SOS_PAYMENT44640.4%
 
CRBT_REVENUE39350.4%
 
Other values (12)140141.4%
 

Length

Max length21
Median length4
Mean length5.294834
Min length3

ROAMING_DETAILS
Categorical

HIGH CARDINALITY
MISSING

Distinct count62
Unique (%)< 0.1%
Missing108880
Missing (%)10.9%
Memory size7.6 MiB
233200005
231532
080.087.092.020
138251
080.087.092.022
136793
080.087.092.028
120836
080.087.092.030
120444
Other values (57)
143264
ValueCountFrequency (%) 
233200005 23153223.2%
 
080.087.092.020 13825113.8%
 
080.087.092.022 13679313.7%
 
080.087.092.028 12083612.1%
 
080.087.092.030 12044412.0%
 
080.087.092.091 519475.2%
 
080.087.092.111 512725.1%
 
080.087.092.121 100331.0%
 
080.087.092.102 100281.0%
 
080.087.092.122 99431.0%
 
Other values (52)100411.0%
 
(Missing)10888010.9%
 

Length

Max length32
Median length32
Mean length28.84248
Min length3

HOME_CELLID
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct count23735
Unique (%)2.4%
Missing883
Missing (%)0.1%
Infinite0
Infinite (%)0.0%
Mean28491.94235710132
Minimum1.0
Maximum9835523.0
Zeros0
Zeros (%)0.0%
Memory size7.6 MiB

Quantile statistics

Minimum1
5-th percentile11332
Q117496
median26467
Q335526
95-th percentile61317
Maximum9835523
Range9835522
Interquartile range (IQR)18030

Descriptive statistics

Standard deviation38065.96932
Coefficient of variation (CV)1.336025773
Kurtosis57320.55552
Mean28491.94236
Median Absolute Deviation (MAD)9038
Skewness222.5416205
Sum2.846678397e+10
Variance1449018021
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
65535103801.0%
 
28728660.1%
 
234988310.1%
 
134988120.1%
 
242196000.1%
 
286405900.1%
 
142195770.1%
 
253565140.1%
 
234995080.1%
 
28605474< 0.1%
 
Other values (23725)98296598.3%
 
(Missing)8830.1%
 
ValueCountFrequency (%) 
117< 0.1%
 
38< 0.1%
 
44< 0.1%
 
5193< 0.1%
 
711< 0.1%
 
ValueCountFrequency (%) 
98355236< 0.1%
 
98355217< 0.1%
 
1933821< 0.1%
 
1933731< 0.1%
 
1929921< 0.1%
 

ORIGINATING_COUNTRY_ID
Real number (ℝ≥0)

SKEWED

Distinct count25
Unique (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean1125.929182
Minimum1
Maximum1126
Zeros0
Zeros (%)0.0%
Memory size7.6 MiB

Quantile statistics

Minimum1
5-th percentile1126
Q11126
median1126
Q31126
95-th percentile1126
Maximum1126
Range1125
Interquartile range (IQR)0

Descriptive statistics

Standard deviation8.089548829
Coefficient of variation (CV)0.007184775879
Kurtosis15293.33835
Mean1125.929182
Median Absolute Deviation (MAD)0
Skewness-121.6382015
Sum1125929182
Variance65.44080025
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
1126999904> 99.9%
 
46014< 0.1%
 
103413< 0.1%
 
409< 0.1%
 
4269< 0.1%
 
418< 0.1%
 
4156< 0.1%
 
175< 0.1%
 
10054< 0.1%
 
344< 0.1%
 
Other values (15)24< 0.1%
 
ValueCountFrequency (%) 
11< 0.1%
 
141< 0.1%
 
153< 0.1%
 
175< 0.1%
 
241< 0.1%
 
ValueCountFrequency (%) 
1126999904> 99.9%
 
103413< 0.1%
 
10061< 0.1%
 
10054< 0.1%
 
9922< 0.1%
 

DESTINATION_COUNTRY_ID
Real number (ℝ)

ZEROS

Distinct count154
Unique (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean224.897212
Minimum-1
Maximum1126
Zeros750311
Zeros (%)75.0%
Memory size7.6 MiB

Quantile statistics

Minimum-1
5-th percentile0
Q10
median0
Q30
95-th percentile1126
Maximum1126
Range1127
Interquartile range (IQR)0

Descriptive statistics

Standard deviation447.7535836
Coefficient of variation (CV)1.990925453
Kurtosis0.2774131595
Mean224.897212
Median Absolute Deviation (MAD)0
Skewness1.507474895
Sum224897212
Variance200483.2717
Histogram with fixed size bins (bins=10)
ValueCountFrequency (%) 
075031175.0%
 
112619265319.3%
 
-1240772.4%
 
40201062.0%
 
10949770.5%
 
99521590.2%
 
101116620.2%
 
11038860.1%
 
10945280.1%
 
18479< 0.1%
 
Other values (144)21620.2%
 
ValueCountFrequency (%) 
-1240772.4%
 
075031175.0%
 
43< 0.1%
 
1243< 0.1%
 
1438< 0.1%
 
ValueCountFrequency (%) 
112619265319.3%
 
11241< 0.1%
 
11152< 0.1%
 
11131< 0.1%
 
11121< 0.1%
 

PS_TYPE
Categorical

MISSING

Distinct count3
Unique (%)< 0.1%
Missing303324
Missing (%)30.3%
Memory size7.6 MiB
1
486191
0
170417
6
 
40068
ValueCountFrequency (%) 
148619148.6%
 
017041717.0%
 
6400684.0%
 
(Missing)30332430.3%
 

Length

Max length3
Median length3
Mean length3
Min length3

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that, for a linear relationship, the angle of the line to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
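
The calculation described above can be sketched in plain Python. This is a minimal illustration, not the profiler's implementation; the function name is hypothetical, and returning NaN for a constant input is a choice made here.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors cancel between numerator and denominator, so sums suffice.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined for a constant variable
    return cov / math.sqrt(var_x * var_y)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]
shifted = [3 * y + 7 for y in ys]  # change of location and (positive) scale
print(pearson_r(xs, ys), pearson_r(xs, shifted))
```

The two printed values agree up to floating-point error, which illustrates the invariance under changes of location and scale. A constant column such as EVENTDATE has zero variance, so r is undefined for it.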

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
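
As a sketch of this procedure, the code below converts each variable to ranks and then applies the Pearson formula to the ranks. The helper names are hypothetical, and assigning tied values their average rank is an assumption made here (it is a common convention).

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Pearson correlation computed on the rank variables of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# A monotonic but nonlinear relationship (y = x**3).
print(spearman_rho([1, 2, 3, 4, 5], [1, 8, 27, 64, 125]))
```

Because the cubic relationship preserves order exactly, the rank variables are identical and ρ reaches its maximum even though the relationship is not linear.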

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
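
The pair-counting definition above can be written directly as a sketch. The function name is hypothetical, and this is the simple form in which tied pairs count as neither concordant nor discordant; statistical libraries often apply a tie-corrected variant, so results on data with many ties may differ. The loop is quadratic in the number of observations, so it is meant for illustration rather than for a table with a million rows.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    observations = list(zip(xs, ys))
    n = len(observations)
    if n < 2:
        raise ValueError("at least two observations are required")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(observations, 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1  # both variables move in the same direction
        elif direction < 0:
            discordant += 1  # the variables move in opposite directions
        # direction == 0 means a tie in x or y: counted as neither
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


print(kendall_tau([1, 2, 3], [1, 3, 2]))
```

In the example, two of the three pairs are concordant and one is discordant, so τ is (2 - 1) / 3.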

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
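
A minimal sketch of a bias-corrected Cramér's V is shown below, assuming the commonly cited form of Bergsma's correction: the chi-squared statistic is divided by n, a bias term is subtracted, and the table dimensions are shrunk before normalising. The function name and the handling of degenerate tables are choices made here, not details taken from the report.

```python
import math
from collections import Counter


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two nominal sequences of equal length."""
    n = len(xs)
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    observed = Counter(zip(xs, ys))
    r, k = len(row_totals), len(col_totals)
    if n < 2 or r < 2 or k < 2:
        return 0.0  # a constant variable carries no association

    # Pearson chi-squared statistic over the full r x k contingency table.
    chi2 = 0.0
    for row, row_total in row_totals.items():
        for col, col_total in col_totals.items():
            expected = row_total * col_total / n
            chi2 += (observed[(row, col)] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: subtract the expected value of phi2 under independence.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denominator = min(r_corr - 1, k_corr - 1)
    if denominator <= 0:
        return 0.0
    return math.sqrt(phi2_corr / denominator)


perfect = cramers_v_corrected(["a", "b"] * 50, ["u", "v"] * 50)
independent = cramers_v_corrected(["a", "a", "b", "b"] * 25, ["u", "v", "u", "v"] * 25)
print(perfect, independent)
```

The first pair of sequences is perfectly associated and the second is exactly independent, which exercises both ends of the 0 to 1 range.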

Missing values

Sample

First rows

EVENTDATEEVENT_LABELPROVIDERCOSIDMSISDNORIGINATIONNETWORKIDDESTINATIONNETWORKIDROAMING_FLAGBILLED_FLAGSHORTCODEIDTOT_CHARGED_AMTBALANCERELOADTOT_ACTUAL_DURATIONNO_OF_EVENTSBONUSTOT_ROUNDED_VOLCELL_IDCLASSIFICATIONPROMOTION_CDPEAKIDRELOADFACEVALUEIMSIMONTHIDLOAD_IDLOADDATECLASSIFICATION2ROAMING_DETAILSHOME_CELLIDORIGINATING_COUNTRY_IDDESTINATION_COUNTRY_IDPS_TYPE
0201908024610001070233204922078110N00.000.01341110554429754358.0100-120.06.200205e+14201908103-AUG-19GPRS080.087.092.02054358.0112601.0
1201908024610001021233203520024110N00.000.0830065535.0100-110.06.200205e+14201908103-AUG-19GPRS080.087.092.11165535.0112601.0
22019080213910001090233500157746110Y017.990.00100NaNBDLYOUTH1MTLY-110.06.200205e+14201908103-AUG-19BUNDLE_SUBSCRIPTIONNaN29776.011260NaN
32019080213910001070233204426836110Y05.000.00100NaNDATABUNDDR5WLNR-110.06.200205e+14201908103-AUG-19BUNDLE_SUBSCRIPTIONNaN34185.011260NaN
420190802410001020233202935215110Y00.100.00100NaNTONERNW@crbtuser010.06.200205e+14201908103-AUG-19CRBT_REVENUENaN16630.011260NaN
520190802410001021233206498686110Y00.030.00100NaNSUB@crbtuser030.06.200205e+14201908103-AUG-19CRBT_REVENUENaN15612.011260NaN
620190802410001090233502959393110Y00.100.00100NaNSTATUS@crbtuser050.06.200205e+14201908103-AUG-19CRBT_REVENUENaN37710.011260NaN
720190802410001021233202471084110Y00.150.00100NaNSUB@crbtuser2010.06.200204e+14201908103-AUG-19CRBT_REVENUENaN24073.011260NaN
8201908024610001090233243585839110N00.000.04081010292165535.01017-110.06.200205e+14201908103-AUG-19GPRS080.087.092.09165535.0112601.0
9201908024610001090233500481791110N00.000.073510512839017.0500-150.06.200205e+14201908103-AUG-19GPRS080.087.092.02839017.0112601.0

Last rows

EVENTDATEEVENT_LABELPROVIDERCOSIDMSISDNORIGINATIONNETWORKIDDESTINATIONNETWORKIDROAMING_FLAGBILLED_FLAGSHORTCODEIDTOT_CHARGED_AMTBALANCERELOADTOT_ACTUAL_DURATIONNO_OF_EVENTSBONUSTOT_ROUNDED_VOLCELL_IDCLASSIFICATIONPROMOTION_CDPEAKIDRELOADFACEVALUEIMSIMONTHIDLOAD_IDLOADDATECLASSIFICATION2ROAMING_DETAILSHOME_CELLIDORIGINATING_COUNTRY_IDDESTINATION_COUNTRY_IDPS_TYPE
999990201908024610001090233509088971110N00.000000.036910589055823330.0100-110.06.200205e+14201908103-AUG-19GPRS080.087.092.03023330.0112600.0
999991201908024610001070233501740319110N00.000000.04210890431239.0100-110.06.200205e+14201908103-AUG-19GPRS080.087.092.02831239.0112600.0
999992201908024610001090233503791094110N00.000000.05216703178767038762.0100-110.06.200205e+14201908103-AUG-19GPRS080.087.092.02838762.0112601.0
999993201908024610001021233209494500110N00.000000.01810970214098.01016-110.06.200205e+14201908103-AUG-19GPRS080.087.092.02014098.0112600.0
999994201908024610001070233247981441110N00.000000.040810031099.01010-110.06.200205e+14201908103-AUG-19GPRS080.087.092.02031099.0112600.0
999995201908024610001090233208673711110N00.000000.02123820447225525113.01016-110.06.200205e+14201908103-AUG-19GPRS080.087.092.1015113.0112606.0
999996201908024610001021233508378181110N00.000000.04921026782314581.01018-150.06.200205e+14201908103-AUG-19GPRS080.087.092.02214581.0112601.0
999997201908024610001070233205754189110N00.000000.01581042812013905.0100-110.06.200205e+14201908103-AUG-19GPRS080.087.092.02013905.0112601.0
999998201908024610001090233200737695110N00.000000.0202931018283913641.0500-110.06.200205e+14201908103-AUG-19GPRS080.087.092.09113641.0112601.0
99999920190802110001070233507696654120Y00.557330.020930063908.0VOICE-110.06.200205e+14201908103-AUG-19VOICE23320000563908.011261126NaN